ABSTRACT We use the Minority Game and some of its variants to show
how efficiency depends on learning in models of agents competing for
limited resources. Exact results from statistical physics give a clear
understanding of the phenomenology, and open the way to the study of
reverse problems. What agents can optimize, and how well, is discussed
in detail.

Designed as a simplification of Arthur's El Farol bar problem [1], the
Minority Game [2, 3] provides a natural framework for studying how selfish
adaptive agents can cope with competition. The major contribution of the
Minority Game is not only to symmetrize the problem, which physicists like
very much, but also to introduce a well parametrized set of strategies, and
more generally to provide a well defined and workable family of models.

In this game, N agents have to choose one of two options at each
time step; those who are in the minority win, the others lose. Obviously,
it is easier to lose than to win, as the number of winners cannot exceed
that of the losers. If the game is played once, only a random choice is
reasonable, according to Game Theory [4]. When the game is repeated, it is
sensible to suppose that agents will try to learn from the past in order to
outperform the other agents. Hence the question of learning arises, as the
minority mechanism entails a never-ending competition.

Let me first introduce the game and the needed formalism. There are N
agents, agent i taking action $a_i \in \{-1, +1\}$. A game master aggregates
the individual actions into $A = \sum_{i=1}^{N} a_i$ and gives private payoffs
$-a_i g(A)$ to each agent $i = 1, \dots, N$. The minority structure of the
game implies that g must be an odd function of A. The simplest choice for g
may seem to be $g(A) = \mathrm{sgn}(A)$, but a linear function is better
suited to mathematical analysis. The MG is a negative sum game, as the total
payoff given to the agents is $-\sum_{i=1}^{N} a_i g(A) = -g(A) A < 0$, since
g is an odd function. In particular, the linear payoff function gives a total
loss of $A^2$; when the game is repeated, the average total loss is nothing
else than the fluctuations of the attendance $\sigma^2 = \langle A^2 \rangle$,
where the average is over time.
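
As a minimal sketch of the formalism above (the function names `play_round` and `fluctuations` are illustrative and do not come from the paper), the following Python snippet plays rounds with the linear payoff $g(A) = A$, checks that the total payoff equals $-A^2$, and estimates the fluctuations for agents who choose at random.

```python
import random

def play_round(actions):
    """One round with the linear payoff g(A) = A: agent i receives -a_i * A."""
    A = sum(actions)
    payoffs = [-a * A for a in actions]
    return A, payoffs

def fluctuations(n_agents, n_rounds, seed=0):
    """Time average of A^2 when every agent picks +1 or -1 at random."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_rounds):
        actions = [rng.choice((-1, 1)) for _ in range(n_agents)]
        A, _ = play_round(actions)
        total += A * A
    return total / n_rounds

A, payoffs = play_round([1, 1, 1, -1, -1])
print(A, payoffs, sum(payoffs))       # the total payoff is -A^2
print(fluctuations(101, 2000) / 101)  # sigma^2 / N, close to 1 for random agents
```

Random agents give $\sigma^2 \simeq N$, which is the benchmark against which the learning schemes below are compared.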

From the point of view of the agents, it is a measure of payoff wastage.
That is why many papers on the MG consider it as the global utility of the
system (world utility hereafter), and try, of course, to minimize it (forward
problem). I shall review the quest for small $\sigma^2$, focusing on exact
results, and show that all proposed mechanisms lead essentially to the same
results.^1 A particular emphasis will be put on inductive behavior, as it
gives rise to particularly rich phenomenology while being well understood.
Finally, the reverse problem is addressed, by deriving what private payoff
function g to use given a world utility W to minimize.



1 No public information 

1.1 "If it ain't broke, don't fix it" 

The arguably simplest behavior is the following [5]: if agent i wins at time
t, she sticks to her choice $a_i(t)$ until she loses, when she takes the
opposite choice with probability p. The dynamics is Markovian, thus can be
solved exactly [5]. When N is large, the fluctuations $\sigma^2$ are of order
$(pN)^2$: indeed, as the number of losers is $\sim N$, the average number of
agents changing their minds at time t is $\sim pN$. Therefore, one can
distinguish three regimes:

• $pN = x = \mathrm{cst}$; this leads to small fluctuations
$\sigma^2 = 1 + 4x(1 + x/3)$, which tend to the absolute minimum
$\sigma^2 = 1$ when $x \to 0$. The time needed to reach the stationary
state is typically of order $\sqrt{N}$.

• $p \sim 1/\sqrt{N}$; this yields $\sigma^2 \sim N$, which is the order of
magnitude of the fluctuations produced by agents making independent choices.

• $pN \gg 1$. In this case, a finite fraction of agents change their mind at
each time step and $\sigma^2 = N(Np^2 + 4(1 - p))/(2 - p)^2 \sim N^2$.
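
The three regimes can be explored with a short simulation. The sketch below is only an illustration of the rule described above, not the exact solution of the reference: `simulate` and `sigma2` are hypothetical helper names, N is taken odd so that a minority always exists, and all agents update simultaneously.

```python
import random

def simulate(n_agents, p, n_steps, seed=0):
    """'If it ain't broke, don't fix it': losers switch side with probability p.

    n_agents is assumed odd, so A is never zero and a minority always exists.
    Returns the list of attendances A(t).
    """
    rng = random.Random(seed)
    actions = [rng.choice((-1, 1)) for _ in range(n_agents)]
    history = []
    for _ in range(n_steps):
        A = sum(actions)
        history.append(A)
        majority = 1 if A > 0 else -1
        for i, a in enumerate(actions):
            # agents on the majority side lost this round
            if a == majority and rng.random() < p:
                actions[i] = -a
    return history

def sigma2(history, burn_in=0):
    """Time average of A^2 after discarding the first burn_in steps."""
    tail = history[burn_in:]
    return sum(A * A for A in tail) / len(tail)

n = 201
print(sigma2(simulate(n, 1.0 / n, 3000), burn_in=1000))  # pN of order 1
print(sigma2(simulate(n, 0.5, 3000), burn_in=1000))      # pN >> 1
```

With N odd, A is odd at every step, so the measured $\sigma^2$ can never fall below 1, in line with the absolute minimum quoted in the first regime.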

The major problem here is that p needs to be tuned in order to reach
high efficiency. But it is very easy to design a feedback from the
fluctuations on p that lowers p as long as the fluctuations are too high,
and to use the above results in order to relate the fluctuations to
$p(t \to \infty)$. Mathematically, this amounts to taking $p(t = 0) = 1$,
$dp/dt = -f(p, N, t)$. For instance, $f(t) = t^{-\beta}$ seems appropriate as
long as $\beta$ is small enough. Note that $p(t) \to 0$ as $t \to \infty$; in
words, the system eventually freezes. From the optimization point of view,
this is welcome, but as for agents, complete freezing, although being a Nash
equilibrium, is not satisfactory, as it may be better for an agent sitting
on the losing side to provoke a game-quake and to profit from a
re-arrangement of the winners/losers. Therefore, an unanswered question is
where to stop the time evolution of p.

^1 Evolutionary models (see for instance |2]|3inil3) are very different in
nature, and are not reviewed here, mostly because they are not exactly
solvable.

Nevertheless, this simple example illustrates well what happens in MGs:
the efficiency essentially depends on the opinion switching rate, which
itself depends on the learning rate. It has to be small in order to reach a
good level of efficiency.

1.2 Inductive behavior 

Inductive behavior can remedy the problems of the previous learning scheme
if, as we shall see, agents know the nature of the game that they are playing.
This subsection is a simplified version of the simplest setting for inductive
agents of ref. [10]. At time t, each agent $i = 1, \dots, N$ plays +1 with
probability $\pi_i(t)$, and -1 with probability $1 - \pi_i(t)$. Learning
consists in changing $\pi_i$ given the outcome of the game at time t. For this
purpose, each agent i has a numerical register $\Delta_i(t)$ which reflects
her perception at time t of the relative success of action +1 versus action
-1. In other words, $\Delta_i(t) > 0$ means that she believes that action +1
has been more successful than -1. The idea is the following: if agent i
observes $A(t) < 0$ she will increase $\Delta_i$ and hence her probability of
playing $a_i = +1$ at the next time step. Reinforcement here means that
$\pi_i$ is an increasing function of $\Delta_i$. For reasons that will become
obvious later, it is advisable to take $\pi_i = (1 + m_i)/2$ and
$m_i = \chi(\Delta_i)$, where $\chi$ is an increasing function and
$\chi(\pm\infty) = \pm 1$. The way in which $\Delta_i(t)$ is updated is the
last and most crucial element of the learning dynamics to be specified:

    $\Delta_i(t + 1) = \Delta_i(t) - \frac{1}{N}\left[A(t) - \eta\, a_i(t)\right]$.    (1)

The $\eta$ term above describes the fact that agent i may account for her own
contribution to A(t). When $\eta = 0$ she believes that A(t) is an external
process on which she has no influence, or does not know what kind of
game she is playing. She may be called naive in this respect. For
$\eta = 1$, agent i considers only the behavior of other agents
$A_{-i}(t) = A(t) - a_i(t)$ and does not react to her own action $a_i(t)$. As
we shall see, this subtlety is the key to high efficiency. The private
utility of sophisticated agents corresponds more or less to what is called
Aristocrat Utility (AU) in COIN's nomenclature [11].
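
To make the role of $\eta$ concrete, here is a small simulation sketch of the update rule (1). It assumes the choice $\chi(\Delta) = \tanh(\Gamma\Delta)$ that the text introduces further on; `update_registers` and `simulate` are illustrative names, and the parameter values are arbitrary.

```python
import math
import random

def update_registers(deltas, actions, eta):
    """Eq. (1): Delta_i <- Delta_i - [A - eta * a_i] / N."""
    n = len(deltas)
    A = sum(actions)
    return [d - (A - eta * a) / n for d, a in zip(deltas, actions)]

def simulate(n_agents, eta, gamma, n_steps, seed=0):
    """Return the time average of A^2 over the second half of the run."""
    rng = random.Random(seed)
    deltas = [0.0] * n_agents          # no prior beliefs
    total, count = 0.0, 0
    for t in range(n_steps):
        # agent i plays +1 with probability (1 + m_i) / 2, m_i = tanh(gamma * Delta_i)
        actions = [1 if rng.random() < 0.5 * (1.0 + math.tanh(gamma * d)) else -1
                   for d in deltas]
        if t >= n_steps // 2:
            A = sum(actions)
            total += A * A
            count += 1
        deltas = update_registers(deltas, actions, eta)
    return total / count

print(simulate(20, eta=0.0, gamma=1.0, n_steps=20000))  # naive agents
print(simulate(20, eta=1.0, gamma=1.0, n_steps=20000))  # sophisticated agents
```

In this sketch the only difference between the two runs is the $\eta a_i$ term: an agent who subtracts her own action reinforces the side she is on, which drives the population towards two frozen groups.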

Naive agents: $\eta = 0$

It is possible to show that agents minimize the predictability
$H = \langle A \rangle^2$. As a consequence H vanishes in the $t \to \infty$
limit. There are of course many states with H = 0 and the dynamics selects
that which is the "closest" to the initial condition. To be more precise, let
$\Delta_i(0)$ be the initial condition (which encodes the a priori beliefs of
agent i on which action is the best one). As $t \to \infty$,
$\langle A \rangle_t = \sum_i m_i(t) \to 0$ and $\Delta_i(t)$ converges to

    $\Delta_i(\infty) = \Delta_i(0) + \delta\Delta$, with $\delta\Delta = -\frac{1}{N}\int_0^\infty dt\, \langle A \rangle_t$.    (2)

The condition $\langle A \rangle_\infty = 0$ provides an equation for
$\delta\Delta$:

    $0 = \sum_{i=1}^{N} \chi\left(\Delta_i(0) + \delta\Delta\right)$.    (3)

By the monotonicity property of $\chi$, this equation has one and only one
solution.

The asymptotic state of this dynamics is information-efficient (H = 0),
but it is not optimal, as, in general, this state is not a Nash equilibrium.
The fluctuations are indeed determined by the behavior of $\chi(x)$. This is
best seen with a particular example: assume that the agents behave according
to a Logit model of discrete choice [15] where the probability of choice a
is proportional to the exponential of the "score" $U_a$ of that choice:
$\pi(a) \propto e^{\Gamma U_a/2}$. With only two choices $a = \pm 1$,
$\pi(a) = (1 + a m)/2$ and $\Delta = U_+ - U_-$, we obtain, up to a rescaling
of $\Gamma$ by a constant factor,^2

    $\chi(\Delta) = \tanh(\Gamma\Delta), \quad \forall i$.    (4)

Here $\Gamma$ is the learning rate, which measures the scale of the reaction
in agents' behavior (i.e. in $m_i$) to a change in $\Delta_i$ ^1]. We also
assume that agents have no prior beliefs: $\Delta_i(0) = 0$. Hence
$\Delta_i(t) = y(t)/\Gamma$ is the same for all agents. From the results
discussed above, we expect in this case the system to converge to the
symmetric Nash equilibrium $m_i = 0$ for all i. This is not going to be true
if agents are too reactive, i.e. if $\Gamma > \Gamma_c$. Indeed,
$y(t) = \Gamma\Delta_i(t)$ satisfies the equation

    $y(t + 1) = y(t) - \frac{\Gamma}{N}\sum_{i=1}^{N} a_i(t) \simeq y(t) - \Gamma \tanh[y(t)]$    (5)

where the approximation in the last equation relies on the law of large
numbers for $N \gg 1$. Eq. (5) is a dynamical system. The point y = 0 is
stationary, but it is easy to see that it is only stable for
$\Gamma < \Gamma_c = 2$. For $\Gamma > 2$, a cycle of period 2 arises, as
shown in Fig. 1. This has dramatic effects on the optimality of the system.
Indeed, let $\pm y^*$ be the two values taken by y(t) in this cycle.^3 Since
$y(t + 1) = -y(t) = \pm y^*$ we still have $\langle A \rangle = 0$ and hence
H = 0. On the other hand $\sigma^2 \simeq N^2 \tanh^2(y^*) = (2N y^*/\Gamma)^2$
is of order $N^2$, which is even worse than the symmetric Nash equilibrium
$\pi_i = 1/2$ for all i, where $\sigma^2 = N$.



^2 This learning model has been introduced by [13] in the context of the MG.
^3 $\pm y^*$ are the two non-zero solutions of $2y = \Gamma \tanh(y)$.

FIGURE 1. Graphical iteration of the map y(t) for $\Gamma = 1.8 < \Gamma_c$
and $\Gamma = 2.5 > \Gamma_c$.

Hence, one finds again a transition from $\sigma^2 \propto N$ to
$\sigma^2 \propto N^2$ when the learning rate is too large.
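
The stability threshold can be checked by iterating the deterministic map of Eq. (5) directly. The snippet below is a sketch with an arbitrary starting point $y_0 = 0.1$; `iterate_map` is an illustrative name.

```python
import math

def iterate_map(gamma, y0=0.1, n_steps=2000):
    """Iterate y -> y - gamma * tanh(y); return the last two iterates."""
    y = y0
    for _ in range(n_steps):
        y = y - gamma * math.tanh(y)
    return y, y - gamma * math.tanh(y)

# Below the threshold the fixed point y = 0 attracts the dynamics.
print(iterate_map(1.8))
# Above it, y alternates between +y* and -y*, with 2 y* = gamma * tanh(y*).
y_a, y_b = iterate_map(2.5)
print(y_a, y_b, 2 * abs(y_a) - 2.5 * math.tanh(abs(y_a)))
```

Linearizing the map around y = 0 gives a multiplier $1 - \Gamma$, whose modulus exceeds 1 exactly when $\Gamma > 2$, which is the value of $\Gamma_c$ quoted in the text.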

Sophisticated agents: $\eta > 0$

It is easy to check that with $\eta > 0$, following the same steps as in the
previous section, the learning dynamics of agents minimize the function

    $H_\eta = \langle A \rangle^2 - \eta \sum_{i=1}^{N} m_i^2$.    (6)

Since $H_\eta$ is a harmonic function, $H_\eta$ attains its minima on the
boundary of the domain $[-1, 1]^N$. In other words, $m_i = \pm 1$ for all i,
which means that agents play pure strategies $a_i = m_i$. The stable states
are optimal Nash equilibria for N even. By playing pure strategies agents
minimize the second term of $H_\eta$. Of all corner states where
$m_i^2 = 1$ for all i, agents select those with $\langle A \rangle = 0$ by
dividing into two equal groups playing opposite actions. All these states
have minimal "energy" $H_\eta = -N\eta$. Which of these states is selected
depends on the initial conditions $\Delta_i(0)$, but this has no influence
on the outcome, since $\langle A \rangle = 0$.

Note that the set of stable states is disconnected. Each state has its basin
of attraction in the space of $\Delta_i(0)$. The stable state changes
discontinuously as $\Delta_i(0)$ is varied. This contrasts with the case
$\eta = 0$ where Eq. (3) implies that the stationary state changes
continuously with $\Delta_i(0)$ and the set of stationary states is
connected.

For N odd, similar conclusions can be found. This can be understood
by adding a further agent to a state with N - 1 (even) agents in a Nash
equilibrium. Then, up to a constant, $H_\eta = (1 - \eta) m_i^2$, so for
$\eta < 1$ the new agent will play a mixed strategy $m_i = 0$, whereas for
$\eta > 1$ it will play a pure strategy. In both cases other agents have no
incentive to change their position. In this case we find $\sigma^2 \le 1$.

It is remarkable how the addition of the parameter $\eta$ radically changes
the nature of the stationary state. Most strikingly, fluctuations are reduced
by a factor N. From a design point of view, this means that one has either
to give a personalized feedback to autonomous agents, or to make them
more sophisticated, for instance because they need to know the functional
form of the payoff.



2 Public information 

As each agent has an influence on the outcome of the game, the behavior
of a particular agent may introduce patterns that the other agents will try
to exploit. For instance, if only one agent begins to think that the outcome
of the next game depends on some external state, such as the present weather
in Oxford, and behaves accordingly, then indeed, the outcome will depend
on it.^4 But this means that other agents can exploit this new pattern by
behaving conditionally on the same state. One example of a family of public
information states that agents may consider as relevant is the past winning
choices, for instance a window of size M of past outcomes. Each such
state can be represented by a bit-string of size M, hence there are $2^M$
possible states of the world. This kind of state has a dynamics of its own:
it diffuses on a so-called De Bruijn graph jj^l-. Another state dynamics
consists simply in drawing at random the state at time t from some ensemble
of size P (e.g. $P = 2^M$). All exact results below are obtained with
this setup.
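
The two state dynamics just described can be written in a few lines. In this sketch the history is encoded as an integer whose M bits are the last M winning sides; `next_history` and `random_state` are illustrative names, not notation from the paper.

```python
import random

def next_history(mu, winning_bit, M):
    """Shift the M-bit history left, append the new winning side (0 or 1)."""
    return ((mu << 1) | winning_bit) & ((1 << M) - 1)

def random_state(P, rng):
    """Alternative dynamics: draw the state uniformly among P possibilities."""
    return rng.randrange(P)

M = 3
print(next_history(0b101, 1, M))   # 0b011, i.e. 3
# Every node of the De Bruijn graph has exactly two successors.
for mu in range(2 ** M):
    print(mu, sorted({next_history(mu, b, M) for b in (0, 1)}))
print(random_state(2 ** M, random.Random(0)))
```

Real histories move along the edges of this graph, whereas the random draw visits all P states uniformly by construction.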

2.1 Neural Networks 

Two types of neural networks have been studied in the context of the MG
[17, 18, 19]. Beyond the mere academic question of how well or badly they
can perform, it is worth noting that these papers were the first to consider
interacting neural networks.

Refs. [17, 18] introduced simple perceptrons playing the minority game.
Each perceptron $i = 1, \dots, N$ is made up of M weights
$\mathbf{w}_i = (w_i^1, \dots, w_i^M)$ which are drawn at random before the
game begins. The decision of network i is
$a_i = \mathrm{sgn}(\mathbf{w}_i \cdot \boldsymbol{\Omega})$, where
$\boldsymbol{\Omega}$ is the vector containing the M last minority signs. The
payoff was chosen to be $-a_i\, \mathrm{sgn}(A)$. Neural networks are trained
following the usual Hebbian rule, that is,

    $\mathbf{w}_i(t + 1) = \mathbf{w}_i(t) - \frac{\eta}{M}\, \boldsymbol{\Omega}_t\, \mathrm{sgn}(A_t)$.    (7)

^4 This kind of self-fulfilled prophecy is found for instance in financial
markets, where it is called 'sunspot effect'.

Under some simplifying assumptions, it is possible to find that the
fluctuations are given by ^1

    $\sigma^2 = N + N(N - 1)\left[1 - \frac{2}{\pi} \arccos\left(\frac{K - 1/(N - 1)}{K + 1}\right)\right]$    (8)

where K = ^1 + y/l + ^^^^r;^ J • The best efficiency, obtained in the
limit of vanishing learning rate, is given by

    (9)

This means that the fluctuations are at best of order N, and at worst of
order $N^2$ when the learning rate is too high. This is likely to be
corrected for neural networks with sophisticated private utility.



2.2 Inductive behavior 

El Farol's problem was introduced with public information and inductive
behavior [1], but with no precise characterization of the strategy space. In
most MG-inspired models, a strategy is a lookup table a, or a map, or a
function, which predicts the next outcome $a^\mu$ for each state $\mu$, and
whose entries are fixed for the whole duration of the game. Each agent i has
a set of S strategies, say S = 2 ($a_{i,1}$ and $a_{i,2}$), and uses them
essentially in the same way as before.

Naive agents

To each of her strategies, agent i associates a score $U_{i,s}$ which evolves
according to

    $U_{i,s}(t + 1) = U_{i,s}(t) - a_{i,s}^{\mu(t)}\, g[A(t)]$    (10)

Since we consider S = 2, only the difference
$\Delta_i = U_{i,2} - U_{i,1}$ matters, and

    $\Delta_i(t + 1) = \Delta_i(t) - \left(a_{i,2}^{\mu(t)} - a_{i,1}^{\mu(t)}\right) g[A(t)]$    (11)

Note that now $\Delta_i$ encodes the perception of the relative performance
of the two strategies of agent i, $\Delta_i > 0$ meaning that agent i thinks
that strategy 2 is better than strategy 1, and $m_i$ sets the frequency of
use of strategy 2, $\pi_i = (1 + m_i)/2$. As before, we consider
$\chi(x) = \tanh(\Gamma x)$. This kind of agent minimizes the
predictability, which has now to be averaged over the public information
states

    $H = \frac{1}{P}\sum_{\mu=1}^{P} \langle A|\mu \rangle^2 = \overline{\langle A \rangle^2}$    (12)

where $\overline{Q} = \frac{1}{P}\sum_{\mu=1}^{P} Q^\mu$ is a useful shortcut
for the average over the states of the world. In contrast with the case with
no information, H is not always canceled by the agents. This is due to the
fact that the agents are faced with P possible states, but their control
over their behavior is limited: when they switch from one strategy to
another, they change their behavior potentially for all states. In fact all
macroscopic quantities such as $H/N$ and $\sigma^2/N$ depend on the ratio
$\alpha = P/N$ [20, 21] 1^, which is therefore the control parameter of the
system. Solving this model is much more complex and requires tools of
Statistical Physics of disordered systems. The resulting picture is that for
infinite system size ($P, N \to \infty$ with $P/N = \alpha = \mathrm{cst}$)
[35] (see also Fig. 2),

• H > 0 if $\alpha = P/N > \alpha_c = 0.3374\ldots$. In this region, the
system is not informationally efficient. It tends to a stationary state
which is unique and stable, and does not depend either on $\Gamma$ or on
initial conditions. $\Gamma$ is a time scale [14].

• H = 0 when $\alpha < \alpha_c$. Since agents succeed in minimizing H, the
question for them is what should they do? They do not know, and as
a result, the dynamics of the system is very complex: it depends on
initial conditions^5 [TUl El Ei |^ , and on $\Gamma$ [HI El gS]. Any value
of the fluctuations can be obtained, from $\sigma^2 = 1$ for very
heterogeneous initial conditions $\Delta_i(t = 0)$ to $\sigma^2 \sim N^2$ for
$\Gamma = \infty$ and homogeneous initial conditions, including
$\sigma^2 \sim N$ for $\Gamma = 0$ and any initial conditions. Two
alternative theories have been proposed, one which is exact, but which has
to be iterated, and another one which rests on a closed form for the
fluctuations. Iterating the exact theory is hard, since the t-th iteration
is obtained by inverting $t \times t$ matrices, and one has to average over
several realizations. Nevertheless, a hundred numerical iterations bring
promising results [27].
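
A bare-bones simulation helps to see how $\sigma^2/N$ and $H/N$ are measured in practice. The sketch below makes several simplifying assumptions that are not in the text: states are drawn at random, the payoff is linear, agents always play their currently best strategy (the $\Gamma = \infty$ limit), and H is estimated with visit frequencies instead of the exact weight 1/P. The name `minority_game` is illustrative.

```python
import random

def minority_game(n_agents, n_states, n_steps, seed=0):
    """Naive agents with S = 2 lookup tables; returns (sigma^2/N, H/N)."""
    rng = random.Random(seed)
    # strategies[i][s][mu] is the action of strategy s of agent i in state mu
    strategies = [[[rng.choice((-1, 1)) for _ in range(n_states)]
                   for _ in range(2)] for _ in range(n_agents)]
    deltas = [0] * n_agents            # Delta_i = U_{i,2} - U_{i,1}
    sum_a2, measured = 0, 0
    sum_a = [0] * n_states
    visits = [0] * n_states
    for t in range(n_steps):
        mu = rng.randrange(n_states)
        # index 1 stands for strategy 2, used when Delta_i > 0; ties are random
        choice = [1 if d > 0 or (d == 0 and rng.random() < 0.5) else 0
                  for d in deltas]
        A = sum(strategies[i][choice[i]][mu] for i in range(n_agents))
        for i in range(n_agents):      # Eq. (11) with g(A) = A
            deltas[i] -= (strategies[i][1][mu] - strategies[i][0][mu]) * A
        if t >= n_steps // 2:          # measure in the second half only
            sum_a2 += A * A
            sum_a[mu] += A
            visits[mu] += 1
            measured += 1
    sigma2 = sum_a2 / measured
    H = sum(visits[m] / measured * (sum_a[m] / visits[m]) ** 2
            for m in range(n_states) if visits[m] > 0)
    return sigma2 / n_agents, H / n_agents

print(minority_game(11, 64, 20000))    # alpha = 64 / 11, deep in the H > 0 phase
```

Repeating the run for several values of N at fixed P traces both quantities as functions of $\alpha$; averaging over many realizations of the strategies is needed to obtain smooth curves.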

The origin of the phase transition can easily be understood in terms of
linear algebra: canceling H = 0 means that $\langle A|\mu \rangle = 0$ for
all $\mu$. This is nothing else than a set of P linear equations of N
variables $\{m_i\}$. As the variables are bounded ($0 \le m_i^2 \le 1$), one
needs more than P of them, $N = P/\alpha_c > P$ to be precise [2H1-.

In fact, the transition from low to high (anomalous) fluctuations does
not occur at $\alpha_c$ for finite system size, as it clearly appears on
Fig. 2. This can be traced back to a signal-to-noise ratio transition: the
system is dynamically stable in the phase of H > 0 as long as the
signal-to-noise ratio $H/\sigma^2$ is larger than $K/\sqrt{P}$ for some
constant K. This transition is universal for naive competing agents. Hence
in this kind of interacting agents systems, the ultimate cause of large
fluctuations is this signal-to-noise transition and high learning rate.
Sophisticated agents are not affected by this problem, as explained below.

^5 Physicists say that it is not ergodic.

Sophisticated agents 

As before, a sophisticated agent is able to disentangle her own contribution
from g(A). Eq. (11) becomes [121301:

    $\Delta_i(t + 1) = \Delta_i(t) - \left(a_{i,2}^{\mu(t)} - a_{i,1}^{\mu(t)}\right) g\left(A(t) - a_i(t)\right)$    (13)

When the payoff is linear, g(A) = A, the agents also minimize the
fluctuations $\sigma^2 = \langle A^2 \rangle$. Similarly, they end up using
only one strategy, which implies that $H = \sigma^2$. In this case, they
cannot cancel $\langle A|\mu \rangle$ for all $\mu$ at the same time, hence
$\sigma^2/N > 0$. How to solve exactly this case is known in principle
[22, 30]. 'In principle' here means that the minimization of $\sigma^2$ is
hard from a technical point of view; how much harder is also a question
hard to answer. A first step was done in ref. [31], which is able to
describe reasonably well the behavior of the system. Interestingly, in this
case the signal-to-noise ratio transition does not exist, as the signal is
also the noise ($H = \sigma^2$), hence, there is no high volatility region
(see Fig. 3).

FIGURE 3. Fluctuations produced by sophisticated agents (P = 64, NIT = 100P,
average over 100 realizations).

Therefore, the fluctuations are again considerably reduced by introducing
sophisticated agents. An important point here is that the number of stable
final states $\{m_i\}$ |2P| grows exponentially when N increases. Which one
is selected depends on the initial conditions, but the efficiency of the
final state greatly fluctuates. As the agents (and the programmer) have no
clue of which one to select, the system ends up having non-optimal
fluctuations of order N, as seen in Fig. 3.



3 Forward/reverse problems 

Inductive agents minimize a world utility whose determination is the first
step in solving the forward problem. Finding analytically its minimum is
then possible in principle thanks to methods of Statistical Physics [23]. The
reverse problem consists in starting from a world utility W and finding the
appropriate private payoff.

3.1 Naive agents 

The case with no information (P = 1) is trivial, since
$\langle A \rangle = 0$ in the stationary state, hence all functions
$W = \langle A \rangle^{2n}$ (n integer) are minimized by a linear payoff.
When the agents have access to public information (P > 1), the world
utility W given any private payoff function g(A) is pS]

    $W_{\mathrm{naive}}(\{m_i\}) = \frac{1}{P}\sum_{\mu=1}^{P} \int \frac{dx}{\sqrt{2\pi}}\, e^{-x^2/2}\, G\left(\langle A|\mu \rangle + x\sqrt{D}\right)$    (14)

where $g(x) = dG(x)/dx$ and $D = \sigma^2 - H = (N - \sum_i m_i^2)/2$. In
other words, the agents select the set of strategy usage frequencies
$\{m_i\}$ that minimizes W. The final state is unique and does not depend on
the initial conditions. Note that in Eq. (14), only powers of
$\langle A|\mu \rangle$ ($\mu = 1, \dots, P$) appear, which means that naive
agents are only able to minimize world utilities that only depend on these
quantities. This implies that a phase transition always happens if the
agents are naive, and even more, that it always happens at the same
$\alpha_c = 0.3374\ldots$, as conjectured from numerical simulations in
[32]. As explained above, $\alpha_c$ is the point where it is algebraically
possible to cancel all $\langle A|\mu \rangle$. The above theory also means
that the stationary state depends only weakly on the payoff, which can be
seen numerically by comparing the $m_i$ of a given set of agents for
different payoffs.

The reverse problem is now to find g given W. Let us focus on the
particular example $W = \overline{\langle A \rangle^{2n}}$ (n integer).
First, one determines the world utility $W^{(2k)}$ associated with
$g(x) = 2k x^{2k-1}$, where k is an integer,

    $W^{(2k)} = \sum_{l=0}^{k} \binom{2k}{2l}\, \chi_{2(k-l)}\, D^{k-l}\, H_{2l}$    (15)

where $\chi_l = \int dx\, \exp(-x^2/2)\, x^l/\sqrt{2\pi}$ is the l-th moment
of a Gaussian distribution of unit variance and zero average, and
$H_{2l} = \overline{\langle A \rangle^{2l}}$ is the 2l-norm of the vector
$(\langle A|\mu \rangle)$. Suppose now that one wishes to minimize
$W = H_{2n}$. This can be done in principle with a linear combination of the
$W^{(2k)}$:

    $W = H_{2n} = \sum_{k=0}^{n} a_k W^{(2k)} = \sum_{k=0}^{n} a_k \sum_{l=0}^{k} \binom{2k}{2l}\, D^{k-l}\, \chi_{2(k-l)}\, H_{2l}$    (16)

The condition on the $\{a_k\}$ is that the coefficient of $H_{2k}$ be 0 for
$k = 0, \dots, n - 1$, and the coefficient of $H_{2n}$ be 1, that is

    $\sum_{m=k}^{n} a_m \binom{2m}{2k}\, D^{m-k}\, \chi_{2(m-k)} = 0, \qquad 1 \le k \le n - 1$    (17)

and $a_n = 1$. Then the problem is solved by finding the solution of these
n - 1 linear equations of $a_k$, $k = 1, \dots, n - 1$, and taking
$g(x) = \sum_{k=1}^{n} 2k\, a_k x^{2k-1}$. Note that the set of the problems
that naive agents can solve is of limited practical interest.
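
The polynomial case can be checked numerically. The sketch below assumes the reading of Eq. (14) in which, for a single state with conditional average a, the world utility is the Gaussian average of $G(a + x\sqrt{D})$; with $G(x) = x^{2k}$ a binomial expansion then gives a sum over even Gaussian moments. The function names are illustrative, and the integral is evaluated with a plain trapezoidal rule.

```python
import math

def gaussian_moment(l):
    """chi_l: l-th moment of a zero-mean, unit-variance Gaussian, (l - 1)!!."""
    if l % 2 == 1:
        return 0.0
    return float(math.prod(range(1, l, 2)))

def w_expansion(a, D, k):
    """Binomial expansion of the Gaussian average of (a + x sqrt(D))^(2k)."""
    return sum(math.comb(2 * k, 2 * l) * gaussian_moment(2 * (k - l))
               * D ** (k - l) * a ** (2 * l) for l in range(k + 1))

def w_quadrature(a, D, k, cutoff=12.0, n_points=24001):
    """Same average computed by trapezoidal integration over [-cutoff, cutoff]."""
    h = 2.0 * cutoff / (n_points - 1)
    total = 0.0
    for j in range(n_points):
        x = -cutoff + j * h
        weight = 0.5 if j in (0, n_points - 1) else 1.0
        total += weight * math.exp(-x * x / 2.0) * (a + x * math.sqrt(D)) ** (2 * k)
    return total * h / math.sqrt(2.0 * math.pi)

print(w_expansion(1.5, 2.0, 2), w_quadrature(1.5, 2.0, 2))
```

For a = 1.5, D = 2 and k = 2 the expansion reads $a^4 + 6a^2 D + 3D^2 = 44.0625$, and the quadrature agrees to high accuracy; averaging the same expression over the states $\mu$ turns the powers $a^{2l}$ into the quantities $H_{2l}$.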



3.2 Sophisticated agents

Sophisticated agents have instead

    (18)

where $D_{-i} = [(N - 1) - \sum_{j \ne i} m_j^2]/2$. This case is much
simpler than the previous one, as all agents end up playing only one
strategy [22], that is, $D_{-i} = 0$. Therefore, in this case, if
$g(A) = 2k A^{2k-1}$,

    $W^{(2k)} = \overline{\langle A \rangle^{2k}}$    (19)

Interestingly, similar functions are well-studied in Statistical Physics,
where they usually represent the energy of interacting magnetic moments
called "spins": a (classical) spin can have two values -1 or +1, which is
the equivalent of choosing strategy 1 or 2. A well-known qualitative change
occurs between k = 2 and k > 2, where the mathematical minimization of
W is somehow less problematic; this may also be the case in such MGs.
The final state is not unique, and depends on initial conditions, implying
that agents are not particularly good at minimizing such functions.



3.3 Example: agent-based optimization 

Some optimization problems are so hard to solve that they have a name:
they are hard, NP-hard '31'. No known algorithm can find the optimum of this
kind of problem in polynomial time. One of them consists in finding amongst
N either analog or binary components the combination that is the least
defective [35]. In the problem with analog components, one has a set of N
measuring devices; instead of A, each of them records the wrong value
$A + a_i$ with a constant bias $a_i$, $i = 1, \dots, N$, drawn from a given
probability function. The problem is to find a subset such that the
average bias

    $\epsilon\{n_i\} = \frac{\left|\sum_{i=1}^{N} n_i a_i\right|}{\sum_{i=1}^{N} n_i}$    (20)

is minimal. Here $n_i = 0, 1$ depending on whether component i is included in
the subset. Statistical Physics shows that
$\langle \epsilon_{opt} \rangle \simeq C\, 2^{-N}/\sqrt{N}$ for large N,
with $C \simeq 4.6$ (the average is over the samples). In order to find the
optimal subset, one cannot do better than enumerating all the $2^N$
possibilities. This makes it hard to tackle such problems for N larger than
40 with present-day computers. Agent-based optimization on the other hand
typically needs O(N) iterations and can be used with much larger samples. It
is clear that one cannot expect this method to perform as well as the
enumeration; still, how well it performs as a function of the setup is a
valuable question. Ref. [37] compares a set of private payoffs and concludes
that agent-based optimization is better than simulated annealing for short
times and large samples, provided that the agents' private utility is
"aristocratic".
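
For small N the enumeration mentioned above fits in a few lines. The sketch assumes that the average bias of a subset is the absolute value of the summed biases divided by the number of selected components; `average_bias` and `best_subset` are illustrative names, and the biases in the example are arbitrary.

```python
from itertools import product

def average_bias(biases, selection):
    """|sum_i n_i a_i| / sum_i n_i, or infinity for the empty subset."""
    used = sum(selection)
    if used == 0:
        return float("inf")
    return abs(sum(n * a for n, a in zip(selection, biases))) / used

def best_subset(biases):
    """Exhaustive search over the 2^N selections; only feasible for small N."""
    best, best_err = None, float("inf")
    for selection in product((0, 1), repeat=len(biases)):
        err = average_bias(biases, selection)
        if err < best_err:
            best, best_err = selection, err
    return best, best_err

print(best_subset([0.5, -0.25, -0.25, 2.0]))
```

The loop visits $2^N$ selections, which is what limits this approach to a few tens of components.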

Optimizing $h = |\sum_i n_i a_i|$ and then dividing by the number of
components used in the chosen subset leads to almost optimal subsets [35].
Hence, we can use sophisticated MG-agents in order to optimize $h^2$ [38],
which plays the role of the fluctuations in the MG. The most
straightforward application of the MG is to give two devices to each agent,
which are their strategies. Each agent ends up playing with only one
strategy. This setup constrains the use of N/2 devices in the optimal
subset, and gives an error of order N~^-^, to be compared with the
exponential decay of the optimal average error $\epsilon_{opt}$. One can
unconstrain the agents by giving only one component to each agent, and
letting them decide whether to include their components or not into
$\epsilon$, making the game 'grand canonical' [39, 40]. This is achieved by
the following score evolution

FIGURE 4. Average error $\epsilon$ versus the size N of the defective
component set for MG with S = 2 (circles), and S = 1 (squares), S = 2 with
removal (stars) and S = 1 with removal (full squares). 500N iterations per
run, averages over 1000 samples.

    $U_i(t + 1) = U_i(t) - a_i\left[A - n_i(t)\, a_i\right]$    (21)

and $n_i(t) = \Theta[U_i(t)]$. The $-n_i a_i$ term makes the agents
sophisticated. This gives similar results as those of ref. [23], as indeed
the Aristocrat Utility is essentially the same concept as sophisticated
agents. In any case, the dynamics lowers the fluctuations but does not reach
their global minimum. The resulting error $\epsilon$ is much better with
S = 1 than with S = 2: it decays ~ iV~^ (Fig. 4). Therefore, as in the
optimal case, unconstraining the problem by not fixing the number of
selected components leads to much better efficiency.
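
The score evolution (21) translates almost literally into code. The sketch below assumes that A is the summed bias of the currently selected components, $A = \sum_j n_j a_j$, that all agents update simultaneously, and that the initial scores are drawn uniformly in [-1, 1]; `step`, `average_bias` and `optimize` are illustrative names.

```python
import random

def step(scores, biases):
    """One simultaneous update of Eq. (21) with n_i = Theta(U_i)."""
    selection = [1 if u > 0 else 0 for u in scores]
    A = sum(n * a for n, a in zip(selection, biases))
    new_scores = [u - a * (A - n * a)
                  for u, n, a in zip(scores, selection, biases)]
    return new_scores, selection, A

def average_bias(biases, selection):
    """|sum_i n_i a_i| / sum_i n_i, or infinity for the empty subset."""
    used = sum(selection)
    if used == 0:
        return float("inf")
    return abs(sum(n * a for n, a in zip(selection, biases))) / used

def optimize(biases, n_iter, seed=0):
    """Run the agents for n_iter steps; return the final selection and error."""
    rng = random.Random(seed)
    scores = [rng.uniform(-1.0, 1.0) for _ in biases]
    for _ in range(n_iter):
        scores, _, _ = step(scores, biases)
    selection = [1 if u > 0 else 0 for u in scores]
    return selection, average_bias(biases, selection)

rng = random.Random(1)
biases = [rng.gauss(0.0, 1.0) for _ in range(20)]
print(optimize(biases, 500 * len(biases)))
```

Each iteration costs O(N) operations, and the outcome depends on the initial scores, which is why several runs with different seeds are worth comparing.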

At this stage, one can improve substantially the error, still remaining in
the O(N) complexity regime. First, since the agents update their behavior
simultaneously, they may be unable to distinguish whether removing only
one component improves the error. We can do it by hand at the end of the
simulations, repeatedly. This is a kind of greedy algorithm. On average,
about 1.5 components are removed. In both the S = 2 and S = 1 cases,
this results in a large improvement (see Fig. 4), and curiously produces
the same error, with a decay ~ 7V~^-^. Nevertheless, the final error is
still far from optimality. This illustrates how hard this optimization
problem is. Much better results can be obtained by removing a group of 2, or
3 components, ad libitum, but of course, this needs much more computing
resources ($O(N^2)$, $O(N^3)$, ...), and eventually amounts to enumerating
all possibilities.

FIGURE 5. Average error $\epsilon$ versus the number of runs for each sample
of the defective component set for MG with S = 2 (circles), and S = 1
(squares), S = 2 with removal (stars) and S = 1 with removal (full squares).
Left panel: N = 20, averages over 1000 samples; right panel: N = 50, average
over 200 samples.
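
The single-component removal step can be sketched as follows. This is one possible reading of the procedure, assuming that at each pass the component whose removal lowers the average bias the most is dropped, and that the process stops when no single removal helps; `greedy_removal` is an illustrative name.

```python
def average_bias(biases, selection):
    """|sum_i n_i a_i| / sum_i n_i, or infinity for the empty subset."""
    used = sum(selection)
    if used == 0:
        return float("inf")
    return abs(sum(n * a for n, a in zip(selection, biases))) / used

def greedy_removal(biases, selection):
    """Repeatedly drop the single component whose removal helps the most."""
    selection = list(selection)
    while sum(selection) > 1:
        best_i, best_err = None, average_bias(biases, selection)
        for i, n in enumerate(selection):
            if n == 0:
                continue
            trial = selection[:i] + [0] + selection[i + 1:]
            err = average_bias(biases, trial)
            if err < best_err:
                best_i, best_err = i, err
        if best_i is None:       # no single removal improves the error
            break
        selection[best_i] = 0
    return selection

print(greedy_removal([0.5, -0.25, -0.25, 2.0], [1, 1, 1, 1]))
```

Each pass tests at most N removals, and the text reports that only about 1.5 components are removed on average, so the overhead stays modest compared with testing pairs or triplets.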

Here is the second trick that keeps the complexity within the O(N) regime.
As mentioned, the final state depends on the initial conditions, and is often
not optimal or not even near optimal. But it is still a local minimum of
$h^2$. Therefore the idea is to do T runs with the same set of defective
devices, changing the initial condition $U_i(t = 0)$, and to select the best
run. It is a kind of simulated annealing [21] with zero temperature, or
partial enumeration where repetition would be allowed. Interestingly,
Figure 5 reports that the decay is apparently a power-law first, and then
begins to saturate. For S = 1, the exponent is about -0.5, and -0.4 for
S = 2; it depends weakly on N. Remarkably, the error decreases faster with
S = 1 agents than with S = 2. Note that the optimal value is at about 10"^,
hence, agents are far from it. This is due to the fact that the agents use
too many components. Nevertheless, the improvement brought by this method is
impressive, and increases as N increases, but cannot keep up with the
exponential decay of $\epsilon_{opt}$: the difference becomes more and more
abysmal. The component removal further lowers the error (same figure), and
more in the S = 2 case than in the S = 1 one. This advantage is reversed for
T large enough when N is larger, as reported by the right panel of Fig. 5.

The other optimization problem recycles binary components [35]: one
has a set of N partially defective processors, each of them able to perform
P different operations. The manufacturing process is supposed to be faulty
with probability f for each operation of each component. Mathematically,
the operation $\mu$ of processor i is permanently defective
($a_i^\mu = -1$) with probability f and works permanently with probability
1 - f ($a_i^\mu = 1$). The probability that a component is working becomes
vanishingly small when P grows at fixed f. The task consists in finding a
subset such that the majority of its components gives the right answer,
that is,

    $\sum_{i=1}^{N} n_i a_i^\mu > 0$ for all $\mu = 1, \dots, P$    (22)

FIGURE 6. Fraction of samples for which a perfectly working subset of
components can be found. f = 0.2, average over 1000 runs.

Surprisingly, the fraction of samples in which a perfectly working subset
of components can be found increases very quickly as N grows at fixed
P and f [22] (see also Fig. 6). Finding a subset that perfectly works is
an easy problem when it is possible, but finding the one which has the
least components is a hard problem [35]. By contrast with the minimization
of fluctuations, here one wishes to maximize $\langle A|\mu \rangle$ for
every $\mu$, that is, the predictability H. Since all the agents eventually
use only one strategy in majority games [21], $H = \sigma^2$, hence, the
fluctuations are also maximized: naive agents are also sophisticated in
this case. A simple majority game does not favor any particular sign of
$\langle A|\mu \rangle$ a priori. However, if $f \ll 1/2$ the sign +, hence
mostly working combinations, are favored. In practice, a majority game
payoff increase is $a\, g(A)$ instead of $-a\, g(A)$ as in minority games,
which means that here one has

    $U_i(t + 1) = U_i(t) + a_i^{\mu(t)} A(t)$    (23)
Majority games with S = 1 turn out to be better than those with S = 2, as
shown in Fig. 6, where the results of enumeration are also displayed. As the
problem of finding a working subset is easy for N large enough, the agents
are successful.
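
The working condition (22) and the majority update (23) can be sketched as follows for S = 1 agents. The encoding is an assumption made for illustration: `table[i][mu]` is +1 if operation mu of processor i works and -1 otherwise, a processor is included when its score is positive, and `works` and `majority_step` are hypothetical names.

```python
def works(selection, table):
    """Eq. (22): the selected processors give a strict majority of correct
    answers for every operation mu."""
    n_ops = len(table[0])
    return all(sum(n * row[mu] for n, row in zip(selection, table)) > 0
               for mu in range(n_ops))

def majority_step(scores, table, mu):
    """Eq. (23) for operation mu, with n_i = 1 when U_i > 0 and 0 otherwise."""
    selection = [1 if u > 0 else 0 for u in scores]
    A = sum(n * row[mu] for n, row in zip(selection, table))
    return [u + row[mu] * A for u, row in zip(scores, table)]

# Three processors, three operations; each processor fails exactly one.
table = [[1, 1, -1],
         [1, -1, 1],
         [-1, 1, 1]]
print(works([1, 1, 1], table))   # every operation has a 2-to-1 majority
print(works([1, 1, 0], table))   # operations 2 and 3 are tied
print(majority_step([1.0, 1.0, -1.0], table, 0))
```

Checking condition (22) costs O(NP) operations for a given selection, so verifying a candidate subset is cheap even when finding the smallest one is not.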



4 Conclusions 

The efficiency of Minority Games seems to be universal with respect to
agents' learning rate: if the latter is too high, anomalous fluctuations,
hence small efficiency, arise. However, these are totally suppressed if the
agents are sophisticated; such agents can coordinate optimally if there is
no public information. An unexplored issue is what happens with neural
networks taking into account their impact on the game. Based on this
'universality', it would be tempting to study neural networks with the
sophisticated payoff.

The study of forward/reverse problems showed the limitations of agent- 
based optimization in hard cases, which leaves the interesting open question 
of how to improve the overall performance, and how the setup of agent- 
based models can and must be tuned for individual cases. 

I am grateful to J. -P. Garrahan, N. F. Johnson, M. Marsili, D. Sher- 
rington and Yi-Cheng Zhang for numerous discussions. This work has been 
supported by EPSRC. 